Raed Al-falahy

Healthcare Data Visualization: A Case Study of Cancer/Malignant Neoplasms

Preparing the dataset for analysis by loading and exploring its contents

Malignant neoplasms

Malignant neoplasms of colon

Malignant neoplasms of lung

Malignant neoplasms of prostate (Male Cancer)

The output of the code is a heatmap of the correlations between different types of cancer. Each cell shows the correlation between two cancer types (or, on the diagonal, between a cancer type and itself), and the cell's color encodes the strength and direction of that correlation. The correlation coefficient (r) ranges from -1 to 1.
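As a minimal sketch of where r comes from, the snippet below computes the Pearson correlation coefficient directly from its definition. It assumes the heatmap uses Pearson correlation, which the text does not state explicitly; the function name `pearson_r` and the numbers are made up for illustration and are not taken from the case-study dataset.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Co-variation divided by the product of the spreads; the result lies in [-1, 1].
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up rates, for illustration only (not from the case-study dataset).
rates_a = [10.0, 20.0, 30.0, 40.0]
rising = [12.0, 22.0, 32.0, 42.0]    # moves in step with rates_a
falling = [40.0, 30.0, 20.0, 10.0]   # moves opposite to rates_a

r_up = pearson_r(rates_a, rising)
r_down = pearson_r(rates_a, falling)
print(r_up, r_down)
```

The two toy series sit at the extremes of the range: one moves perfectly in step with `rates_a`, the other perfectly against it. Real incidence data will normally fall somewhere in between.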

Positive correlation (r > 0): As one variable increases, the other tends to increase as well. The correlation grows stronger as r approaches 1. In the heatmap, these correlations are represented by warmer colors (reds and oranges).

Negative correlation (r < 0): As one variable increases, the other tends to decrease. The correlation grows stronger as r approaches -1. In the heatmap, these correlations are represented by cooler colors (blues and greens).

No correlation (r = 0): There is no linear relationship between the two variables. In the heatmap, this is represented by a neutral color (white).

Every cell on the diagonal has a correlation coefficient of 1, because it compares a variable with itself. For example, the correlation between "MN of colon - Males" and "MN of colon - Males" is 1.
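A heatmap like the one described could be produced along the following lines. This is a sketch under stated assumptions, not the notebook's original code: the data are synthetic, the column names other than "MN of colon - Males" are hypothetical, and the "RdBu_r" colormap is chosen here only because it puts white at zero, red at positive values, and blue at negative values.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data: 30 hypothetical countries, three columns.
# Only the first column name appears in the text; the others are assumed.
rng = np.random.default_rng(0)
base = rng.normal(50, 10, size=30)
df = pd.DataFrame({
    "MN of colon - Males": base + rng.normal(0, 5, size=30),
    "MN of lung - Males": base + rng.normal(0, 8, size=30),
    "MN of prostate - Males": rng.normal(40, 10, size=30),
})

# Pairwise Pearson correlations; each variable correlates perfectly with itself.
corr = df.corr()

# Diverging colormap centered on 0, so color encodes both sign and strength.
ax = sns.heatmap(corr, cmap="RdBu_r", vmin=-1, vmax=1, center=0,
                 annot=True, fmt=".2f")
ax.set_title("Correlation between cancer types (synthetic data)")
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale comparable across different heatmaps, so a given shade always corresponds to the same value of r.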

To explain this to your students, you can say that the heatmap shows how the incidence rates of different cancer types relate to one another. Warmer colors represent a positive correlation: where the incidence rate of one cancer type is higher, the other's tends to be higher too. Cooler colors represent a negative correlation: where the incidence rate of one cancer type is higher, the other's tends to be lower. A neutral color (white) represents no linear correlation between the two cancer types.

It is important to note that correlation does not imply causation. A correlation between two variables indicates a relationship, but it does not necessarily mean that one variable causes the other.
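One way to make this point concrete is a small simulation in which two rates are both driven by a shared third factor but have no effect on each other. Everything below is synthetic and hypothetical (the "shared driver" could stand for something like population age structure); it illustrates the idea and makes no claim about the case-study data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical shared driver that influences both rates.
shared = rng.normal(0, 1, size=n)

# Two rates that each depend on the shared driver but not on each other.
rate_a = shared + rng.normal(0, 0.5, size=n)
rate_b = shared + rng.normal(0, 0.5, size=n)

# The raw rates are strongly correlated even though neither causes the other.
r_raw = np.corrcoef(rate_a, rate_b)[0, 1]

# Subtracting the shared driver leaves only independent noise.
r_adjusted = np.corrcoef(rate_a - shared, rate_b - shared)[0, 1]

print(f"raw r = {r_raw:.2f}, after removing the shared driver r = {r_adjusted:.2f}")
```

In real data the shared driver is usually not observed directly, which is why a strong correlation on its own cannot settle questions of cause.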

The output plot is a pair plot, which is a matrix of scatter plots that helps visualize the relationships between different variables. In this case, the variables are the mean incidence rates of different types of cancer in various countries. The primary goal of this plot is to explore the correlation between the different types of cancer and identify any trends or patterns in the data.

Here's an explanation of the plot that you can provide to your students:

  1. Each scatter plot within the matrix represents a pair of cancer types, with one type plotted on the x-axis and the other on the y-axis. Each point represents a country, positioned by its mean incidence rates for the two cancer types.

  2. The diagonal plots are kernel density estimates (KDE), which provide a smoothed, continuous visualization of the distribution of the mean incidence rates for each cancer type across the countries. These plots can help identify the general shape of the distribution, such as whether it is unimodal, bimodal, or skewed.

  3. If there's a strong positive correlation between two types of cancer, the points in the scatter plot will form an upward-sloping pattern, indicating that as the incidence rate of one cancer type increases, the other cancer type's incidence rate tends to increase as well. Conversely, a strong negative correlation will have the points forming a downward-sloping pattern, meaning that as the incidence rate of one cancer type increases, the other cancer type's incidence rate tends to decrease. If there's little to no correlation, the points will be scattered with no clear pattern.

  4. In addition to identifying correlations, the pair plot can also reveal any potential outliers or unusual data points. Outliers may indicate issues with the data or unique situations in specific countries that warrant further investigation.
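Outliers spotted by eye in the pair plot can be cross-checked numerically. The sketch below uses the common 1.5 x IQR rule of thumb, which is one possible choice and not a method taken from the original analysis; the country labels and rates are invented for illustration.

```python
import pandas as pd

# Invented mean incidence rates for seven hypothetical countries.
mean_rates = pd.DataFrame(
    {"lung": [30.0, 32.0, 29.0, 31.0, 33.0, 30.5, 75.0]},
    index=["Country A", "Country B", "Country C", "Country D",
           "Country E", "Country F", "Country G"],
)

# Interquartile range of the column.
q1 = mean_rates["lung"].quantile(0.25)
q3 = mean_rates["lung"].quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR below Q1 or above Q3.
is_outlier = (mean_rates["lung"] < q1 - 1.5 * iqr) | (mean_rates["lung"] > q3 + 1.5 * iqr)
outliers = mean_rates[is_outlier]
print(outliers)
```

A flagged country is a prompt for a closer look at its data source and context, not proof that the value is wrong.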

Encourage your students to examine the pair plot and identify which pairs of cancer types show strong, weak, or no correlation. Discussing the KDE plots and any noticeable outliers can then lead to a deeper understanding of the data and the relationships between different cancer types.